Overview

Dataset statistics

Number of variables: 13
Number of observations: 21897
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.2 MiB
Average record size in memory: 104.0 B

Variable types

Categorical: 2
Numeric: 8
Boolean: 3

Alerts

Ignore7 has constant value "False" [Constant]
TxnTime has a high cardinality: 914 distinct values [High cardinality]
Amount is highly overall correlated with Ignore5 [High correlation]
Ignore3 is highly overall correlated with StoreNumber [High correlation]
Ignore5 is highly overall correlated with Amount [High correlation]
StoreNumber is highly overall correlated with Ignore3 [High correlation]
SaleFlag is highly overall correlated with Ignore6 [High correlation]
Ignore6 is highly overall correlated with SaleFlag [High correlation]
Quantity is highly overall correlated with Ignore1 [High correlation]
Ignore1 is highly overall correlated with Quantity [High correlation]
Ignore5 is highly skewed (γ1 = 30.78199035) [Skewed]
Quantity has 386 (1.8%) zeros [Zeros]
Ignore1 has 21511 (98.2%) zeros [Zeros]
Amount has 1365 (6.2%) zeros [Zeros]

Reproduction

Analysis started: 2023-03-03 16:44:30.191368
Analysis finished: 2023-03-03 16:44:58.737938
Duration: 28.55 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json

Variables

StoreNumber
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 171.2 KiB
108
16670 
233
5227 

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 65691
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 108
2nd row: 108
3rd row: 108
4th row: 108
5th row: 108

Common Values

ValueCountFrequency (%)
108 16670
76.1%
233 5227
 
23.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
108 16670
76.1%
233 5227
 
23.9%

Most occurring characters

ValueCountFrequency (%)
1 16670
25.4%
0 16670
25.4%
8 16670
25.4%
3 10454
15.9%
2 5227
 
8.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 65691
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 16670
25.4%
0 16670
25.4%
8 16670
25.4%
3 10454
15.9%
2 5227
 
8.0%

Most occurring scripts

ValueCountFrequency (%)
Common 65691
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 16670
25.4%
0 16670
25.4%
8 16670
25.4%
3 10454
15.9%
2 5227
 
8.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 65691
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 16670
25.4%
0 16670
25.4%
8 16670
25.4%
3 10454
15.9%
2 5227
 
8.0%

ItemCode
Real number (ℝ)

Distinct: 8862
Distinct (%): 40.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.2321057 × 10^9
Minimum: 83
Maximum: 9.7829 × 10^11
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 171.2 KiB

Quantile statistics

Minimum: 83
5-th percentile: 1.3600001 × 10^9
Q1: 3.0400774 × 10^9
Median: 5.1500001 × 10^9
Q3: 7.0784459 × 10^9
95-th percentile: 6.1029032 × 10^10
Maximum: 9.7829 × 10^11
Range: 9.7829 × 10^11
Interquartile range (IQR): 4.0383684 × 10^9

Descriptive statistics

Standard deviation: 1.9701736 × 10^10
Coefficient of variation (CV): 2.1340457
Kurtosis: 805.75392
Mean: 9.2321057 × 10^9
Median Absolute Deviation (MAD): 1.9284479 × 10^9
Skewness: 18.331195
Sum: 2.0215542 × 10^14
Variance: 3.8815839 × 10^20
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
1600042060           115    0.5%
7078400628           115    0.5%
7078400620           111    0.5%
7078400624           95     0.4%
7078415054           88     0.4%
7078402802           81     0.4%
4812110208           75     0.3%
7078400500           75     0.3%
7.151415035 × 10^10  73     0.3%
5210001005           71     0.3%
Other values (8852)  20998  95.9%
Value  Count  Frequency (%)
83     4      < 0.1%
450    1      < 0.1%
741    1      < 0.1%
747    11     0.1%
764    2      < 0.1%
768    1      < 0.1%
825    1      < 0.1%
833    1      < 0.1%
930    1      < 0.1%
940    1      < 0.1%
Value                Count  Frequency (%)
9.7829 × 10^11       1      < 0.1%
9.78078 × 10^11      1      < 0.1%
9.78032 × 10^11      1      < 0.1%
8.99407003 × 10^10   1      < 0.1%
8.989990005 × 10^10  1      < 0.1%
8.98282002 × 10^10   1      < 0.1%
8.97519001 × 10^10   1      < 0.1%
8.97519001 × 10^10   1      < 0.1%
8.97034002 × 10^10   1      < 0.1%
8.96324001 × 10^10   1      < 0.1%

TxnTime
Categorical

Distinct: 914
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 171.2 KiB
09-05-2013 10:00
 
132
09-05-2013 17:28
 
100
09-05-2013 16:17
 
98
09-05-2013 17:37
 
97
09-05-2013 12:41
 
96
Other values (909)
21374 

Length

Max length: 16
Median length: 16
Mean length: 16
Min length: 16

Characters and Unicode

Total characters: 350352
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
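
As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to tally the general category of each character in one TxnTime value taken from this report. It covers categories only; scripts and blocks are not exposed by `unicodedata`, and this is not necessarily how the report itself computes them.

```python
import unicodedata
from collections import Counter

# One TxnTime value copied from the report's sample rows.
sample = "09-05-2013 10:00"

# Tally the Unicode general category of every character.
# Nd = Decimal Number, Pd = Dash Punctuation,
# Zs = Space Separator, Po = Other Punctuation.
by_category = Counter(unicodedata.category(ch) for ch in sample)

for category, count in by_category.most_common():
    share = 100 * count / len(sample)
    print(f"{category}: {count} ({share:.1f}%)")
```

Because every TxnTime value has the same 16-character layout, the shares for a single value (75.0% digits, 12.5% dashes, 6.2% each for the space and the colon, after rounding) match the column-wide category frequencies listed in this section.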

Unique

Unique: 45
Unique (%): 0.2%

Sample

1st row: 09-05-2013 00:27
2nd row: 09-05-2013 00:27
3rd row: 09-05-2013 00:27
4th row: 09-05-2013 00:27
5th row: 09-05-2013 00:27

Common Values

ValueCountFrequency (%)
09-05-2013 10:00 132
 
0.6%
09-05-2013 17:28 100
 
0.5%
09-05-2013 16:17 98
 
0.4%
09-05-2013 17:37 97
 
0.4%
09-05-2013 12:41 96
 
0.4%
09-05-2013 17:59 96
 
0.4%
09-05-2013 11:57 94
 
0.4%
09-05-2013 14:21 93
 
0.4%
09-05-2013 16:31 92
 
0.4%
09-05-2013 15:07 91
 
0.4%
Other values (904) 20908
95.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
09-05-2013 21897
50.0%
10:00 132
 
0.3%
17:28 100
 
0.2%
16:17 98
 
0.2%
17:37 97
 
0.2%
12:41 96
 
0.2%
17:59 96
 
0.2%
11:57 94
 
0.2%
14:21 93
 
0.2%
16:31 92
 
0.2%
Other values (905) 20999
47.9%

Most occurring characters

ValueCountFrequency (%)
0 75365
21.5%
1 48300
13.8%
- 43794
12.5%
2 32467
9.3%
5 29673
 
8.5%
3 29395
 
8.4%
9 26035
 
7.4%
(space) 21897
 
6.2%
: 21897
 
6.2%
4 8362
 
2.4%
Other values (3) 13167
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 262764
75.0%
Dash Punctuation 43794
 
12.5%
Space Separator 21897
 
6.2%
Other Punctuation 21897
 
6.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 75365
28.7%
1 48300
18.4%
2 32467
12.4%
5 29673
 
11.3%
3 29395
 
11.2%
9 26035
 
9.9%
4 8362
 
3.2%
7 4581
 
1.7%
8 4415
 
1.7%
6 4171
 
1.6%
Dash Punctuation
ValueCountFrequency (%)
- 43794
100.0%
Space Separator
ValueCountFrequency (%)
(space) 21897
100.0%
Other Punctuation
ValueCountFrequency (%)
: 21897
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 350352
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 75365
21.5%
1 48300
13.8%
- 43794
12.5%
2 32467
9.3%
5 29673
 
8.5%
3 29395
 
8.4%
9 26035
 
7.4%
(space) 21897
 
6.2%
: 21897
 
6.2%
4 8362
 
2.4%
Other values (3) 13167
 
3.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 350352
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 75365
21.5%
1 48300
13.8%
- 43794
12.5%
2 32467
9.3%
5 29673
 
8.5%
3 29395
 
8.4%
9 26035
 
7.4%
(space) 21897
 
6.2%
: 21897
 
6.2%
4 8362
 
2.4%
Other values (3) 13167
 
3.8%

SaleFlag
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 21.5 KiB
True
14288 
False
7609 
ValueCountFrequency (%)
True 14288
65.3%
False 7609
34.7%

Quantity
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 20
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.242088
Minimum: -2
Maximum: 21
Zeros: 386
Zeros (%): 1.8%
Negative: 9
Negative (%): < 0.1%
Memory size: 171.2 KiB

Quantile statistics

Minimum: -2
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 21
Range: 23
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.82540752
Coefficient of variation (CV): 0.66453226
Kurtosis: 75.422994
Mean: 1.242088
Median Absolute Deviation (MAD): 0
Skewness: 6.4751316
Sum: 27198
Variance: 0.68129758
Monotonicity: Not monotonic
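
Several of the statistics above are related by simple arithmetic. The sketch below re-derives them from the Quantity figures printed in this report, assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, range = maximum - minimum, IQR = Q3 - Q1, sum = mean × number of observations). The variable names are illustrative only.

```python
# Figures copied from the Quantity section of this report.
n_observations = 21897
mean_q = 1.242088
std_q = 0.82540752
minimum, maximum = -2, 21
q1, q3 = 1, 1

cv = std_q / mean_q              # coefficient of variation
variance = std_q ** 2            # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range
total = mean_q * n_observations  # sum of all values

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.6f}")
print(f"Range    = {value_range}")
print(f"IQR      = {iqr}")
print(f"Sum      = {total:.0f}")
```

The recomputed values agree with the reported ones up to the rounding of the printed inputs.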
Histogram with fixed size bins (bins=20)
ValueCountFrequency (%)
1 17781
81.2%
2 2841
 
13.0%
3 387
 
1.8%
0 386
 
1.8%
4 295
 
1.3%
5 65
 
0.3%
6 60
 
0.3%
10 25
 
0.1%
8 20
 
0.1%
9 8
 
< 0.1%
Other values (10) 29
 
0.1%
ValueCountFrequency (%)
-2 1
 
< 0.1%
-1 8
 
< 0.1%
0 386
 
1.8%
1 17781
81.2%
2 2841
 
13.0%
3 387
 
1.8%
4 295
 
1.3%
5 65
 
0.3%
6 60
 
0.3%
7 6
 
< 0.1%
ValueCountFrequency (%)
21 1
 
< 0.1%
20 1
 
< 0.1%
15 1
 
< 0.1%
14 3
 
< 0.1%
13 2
 
< 0.1%
12 5
 
< 0.1%
11 1
 
< 0.1%
10 25
0.1%
9 8
 
< 0.1%
8 20
0.1%

Ignore1
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 349
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.016239371
Minimum: 0
Maximum: 6.951
Zeros: 21511
Zeros (%): 98.2%
Negative: 0
Negative (%): 0.0%
Memory size: 171.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 6.951
Range: 6.951
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.16718086
Coefficient of variation (CV): 10.294787
Kurtosis: 406.57853
Mean: 0.016239371
Median Absolute Deviation (MAD): 0
Skewness: 17.096405
Sum: 355.5935
Variance: 0.02794944
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 21511
98.2%
1.02 4
 
< 0.1%
1 3
 
< 0.1%
0.5896 3
 
< 0.1%
1.01 3
 
< 0.1%
0.7995 3
 
< 0.1%
0.3407 3
 
< 0.1%
0.8704 2
 
< 0.1%
0.8195 2
 
< 0.1%
0.5701 2
 
< 0.1%
Other values (339) 361
 
1.6%
ValueCountFrequency (%)
0 21511
98.2%
0.01 2
 
< 0.1%
0.0102 1
 
< 0.1%
0.0108 1
 
< 0.1%
0.0112 1
 
< 0.1%
0.0116 1
 
< 0.1%
0.0171 1
 
< 0.1%
0.0181 1
 
< 0.1%
0.0186 1
 
< 0.1%
0.0206 1
 
< 0.1%
ValueCountFrequency (%)
6.951 1
< 0.1%
5 1
< 0.1%
4.9499 1
< 0.1%
4.8 1
< 0.1%
4.2016 1
< 0.1%
3.9199 1
< 0.1%
3.9117 1
< 0.1%
3.8498 1
< 0.1%
3.7979 1
< 0.1%
3.6291 1
< 0.1%

Amount
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 676
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2578413
Minimum: -49.99
Maximum: 174.99
Zeros: 1365
Zeros (%): 6.2%
Negative: 151
Negative (%): 0.7%
Memory size: 171.2 KiB

Quantile statistics

Minimum: -49.99
5-th percentile: 0
Q1: 1.79
Median: 2.64
Q3: 3.99
95-th percentile: 7.99
Maximum: 174.99
Range: 224.98
Interquartile range (IQR): 2.2

Descriptive statistics

Standard deviation: 3.2872447
Coefficient of variation (CV): 1.0090254
Kurtosis: 369.29381
Mean: 3.2578413
Median Absolute Deviation (MAD): 1.14
Skewness: 9.357096
Sum: 71336.95
Variance: 10.805978
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2 1789
 
8.2%
2.5 1411
 
6.4%
0 1365
 
6.2%
1 1163
 
5.3%
2.99 1072
 
4.9%
3 970
 
4.4%
3.99 869
 
4.0%
4.99 478
 
2.2%
2.49 437
 
2.0%
3.49 424
 
1.9%
Other values (666) 11919
54.4%
ValueCountFrequency (%)
-49.99 1
 
< 0.1%
-12.98 1
 
< 0.1%
-8 6
 
< 0.1%
-7.68 1
 
< 0.1%
-7.5 10
 
< 0.1%
-7.31 1
 
< 0.1%
-7.21 1
 
< 0.1%
-7 1
 
< 0.1%
-6.49 48
0.2%
-6.21 1
 
< 0.1%
ValueCountFrequency (%)
174.99 1
< 0.1%
65.96 1
< 0.1%
63 1
< 0.1%
49.99 1
< 0.1%
44.99 1
< 0.1%
40.47 1
< 0.1%
38.46 1
< 0.1%
37.47 1
< 0.1%
35.97 2
< 0.1%
34.99 1
< 0.1%

Ignore3
Real number (ℝ)

Distinct: 2862
Distinct (%): 13.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 243943.9
Minimum: 20003
Maximum: 870123
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 171.2 KiB

Quantile statistics

Minimum: 20003
5-th percentile: 20055
Q1: 40054
Median: 130039
Q3: 160033
95-th percentile: 860059
Maximum: 870123
Range: 850120
Interquartile range (IQR): 119979

Descriptive statistics

Standard deviation: 316861.96
Coefficient of variation (CV): 1.2989132
Kurtosis: -0.2049935
Mean: 243943.9
Median Absolute Deviation (MAD): 89945
Skewness: 1.2907514
Sum: 5.3416396 × 10^9
Variance: 1.004015 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
130039 85
 
0.4%
160019 81
 
0.4%
40171 76
 
0.3%
140070 69
 
0.3%
20005 69
 
0.3%
30045 67
 
0.3%
20030 64
 
0.3%
150015 64
 
0.3%
40060 63
 
0.3%
40007 62
 
0.3%
Other values (2852) 21197
96.8%
ValueCountFrequency (%)
20003 2
 
< 0.1%
20004 3
 
< 0.1%
20005 69
0.3%
20006 23
 
0.1%
20007 13
 
0.1%
20008 13
 
0.1%
20009 3
 
< 0.1%
20010 24
 
0.1%
20011 23
 
0.1%
20012 10
 
< 0.1%
ValueCountFrequency (%)
870123 1
 
< 0.1%
870122 2
 
< 0.1%
870121 7
 
< 0.1%
870120 27
0.1%
870119 8
 
< 0.1%
870118 10
 
< 0.1%
870117 1
 
< 0.1%
870116 20
0.1%
870115 2
 
< 0.1%
870114 1
 
< 0.1%

Ignore4
Real number (ℝ)

Distinct: 8862
Distinct (%): 40.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 229083
Minimum: 3933
Maximum: 924909
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 171.2 KiB

Quantile statistics

Minimum: 3933
5-th percentile: 12747.6
Q1: 18222
Median: 34329
Q3: 415483
95-th percentile: 897163
Maximum: 924909
Range: 920976
Interquartile range (IQR): 397261

Descriptive statistics

Standard deviation: 343459.55
Coefficient of variation (CV): 1.49928
Kurtosis: -0.37951513
Mean: 229083
Median Absolute Deviation (MAD): 18433
Skewness: 1.2233797
Sum: 5.0162305 × 10^9
Variance: 1.1796447 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
18150 115
 
0.5%
34329 115
 
0.5%
34331 111
 
0.5%
34330 95
 
0.4%
28730 88
 
0.4%
49882 81
 
0.4%
35233 75
 
0.3%
14896 75
 
0.3%
49891 73
 
0.3%
25555 71
 
0.3%
Other values (8852) 20998
95.9%
ValueCountFrequency (%)
3933 2
< 0.1%
3985 1
< 0.1%
4087 2
< 0.1%
4509 1
< 0.1%
4610 2
< 0.1%
4796 1
< 0.1%
4904 1
< 0.1%
4905 1
< 0.1%
4936 1
< 0.1%
4946 1
< 0.1%
ValueCountFrequency (%)
924909 1
 
< 0.1%
923866 1
 
< 0.1%
923539 1
 
< 0.1%
923393 1
 
< 0.1%
923392 2
< 0.1%
923391 1
 
< 0.1%
923357 4
< 0.1%
923355 1
 
< 0.1%
923349 1
 
< 0.1%
923342 2
< 0.1%

Ignore5
Real number (ℝ)

HIGH CORRELATION
SKEWED

Distinct: 636
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.0504667
Minimum: -49.99
Maximum: 349.99
Zeros: 0
Zeros (%): 0.0%
Negative: 9
Negative (%): < 0.1%
Memory size: 171.2 KiB

Quantile statistics

Minimum: -49.99
5-th percentile: 1.19
Q1: 2.27
Median: 3.29
Q3: 4.49
95-th percentile: 9.164
Maximum: 349.99
Range: 399.98
Interquartile range (IQR): 2.22

Descriptive statistics

Standard deviation: 4.0523761
Coefficient of variation (CV): 1.0004714
Kurtosis: 2445.2191
Mean: 4.0504667
Median Absolute Deviation (MAD): 1.1
Skewness: 30.78199
Sum: 88693.07
Variance: 16.421752
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
3.99 1185
 
5.4%
2.99 1098
 
5.0%
2.69 976
 
4.5%
1.99 875
 
4.0%
2.19 845
 
3.9%
3.49 815
 
3.7%
2.79 757
 
3.5%
3.69 680
 
3.1%
2.49 647
 
3.0%
3.79 612
 
2.8%
Other values (626) 13407
61.2%
ValueCountFrequency (%)
-49.99 1
 
< 0.1%
-12.98 1
 
< 0.1%
-4.99 3
< 0.1%
-2.99 1
 
< 0.1%
-2.5 1
 
< 0.1%
-1.39 2
 
< 0.1%
0.02 5
< 0.1%
0.03 2
 
< 0.1%
0.04 2
 
< 0.1%
0.05 3
< 0.1%
ValueCountFrequency (%)
349.99 1
< 0.1%
73.96 1
< 0.1%
69.09 1
< 0.1%
54.4 1
< 0.1%
53.37 1
< 0.1%
49.99 2
< 0.1%
44.97 2
< 0.1%
43.96 1
< 0.1%
41.98 2
< 0.1%
41.97 1
< 0.1%

Ignore6
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 21.5 KiB
True
14182 
False
7715 
ValueCountFrequency (%)
True 14182
64.8%
False 7715
35.2%

Ignore7
Boolean

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 21.5 KiB
False
21897 
ValueCountFrequency (%)
False 21897
100.0%

SegmentCode
Real number (ℝ)

Distinct: 1166
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1553.5355
Minimum: 3
Maximum: 4495
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 171.2 KiB

Quantile statistics

Minimum: 3
5-th percentile: 95
Q1: 562
Median: 919
Q3: 2798
95-th percentile: 3468
Maximum: 4495
Range: 4492
Interquartile range (IQR): 2236

Descriptive statistics

Standard deviation: 1246.2879
Coefficient of variation (CV): 0.80222685
Kurtosis: -1.3416954
Mean: 1553.5355
Median Absolute Deviation (MAD): 631
Skewness: 0.50585432
Sum: 34017767
Variance: 1553233.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
772 325
 
1.5%
723 303
 
1.4%
484 236
 
1.1%
778 232
 
1.1%
2736 222
 
1.0%
459 221
 
1.0%
129 213
 
1.0%
4478 191
 
0.9%
2798 188
 
0.9%
749 188
 
0.9%
Other values (1156) 19578
89.4%
ValueCountFrequency (%)
3 48
0.2%
4 1
 
< 0.1%
5 28
0.1%
6 4
 
< 0.1%
7 36
0.2%
8 12
 
0.1%
10 2
 
< 0.1%
11 7
 
< 0.1%
12 1
 
< 0.1%
13 1
 
< 0.1%
ValueCountFrequency (%)
4495 2
 
< 0.1%
4493 1
 
< 0.1%
4492 5
 
< 0.1%
4490 4
 
< 0.1%
4485 2
 
< 0.1%
4480 2
 
< 0.1%
4479 3
 
< 0.1%
4478 191
0.9%
3652 1
 
< 0.1%
3651 5
 
< 0.1%

Interactions


Correlations


Auto

The auto setting picks an interpretable pairwise metric for each pair of columns, based on their variable types:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramer's V, [0,1]
  • Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
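
The mapping above can be sketched as a small dispatch function plus a binning step for numerical columns. This is only an illustration: `pick_metric` and `discretize` are hypothetical names, and equal-width binning is an assumption, since the report does not state which discretization scheme is used.

```python
def pick_metric(type_a, type_b):
    """Return the metric named in the mapping above for a pair of column types."""
    if type_a == "Numerical" and type_b == "Numerical":
        return "Spearman"
    # Any pair involving a categorical column falls back to Cramer's V.
    return "Cramer's V"


def discretize(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant columns
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(pick_metric("Numerical", "Categorical"))
print(discretize([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], n_bins=5))
```

Fewer bins give a coarser view of the numerical column, while more bins capture finer structure at the cost of sparser contingency tables.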

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
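
A minimal pure-Python sketch of this calculation follows: both variables are converted to ranks, and the Pearson formula is applied to the ranks. Giving tied values the average of their positions is an assumption about tie handling, and the function names are illustrative rather than part of pandas-profiling.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x
print(spearman_rho(x, y), pearson(x, y))
```

For this monotonic but nonlinear relationship, ρ reaches 1 while Pearson's r on the raw values stays below 1.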

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the slope of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
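
The formula can be sketched in a few lines of standard-library Python. The data below are made up for illustration, and the second call checks the invariance property by shifting and rescaling both variables with positive scale factors.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n factors of the covariance and the standard deviations cancel.
    return cov / (sx * sy)


# Hypothetical data, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

base = pearson_r(x, y)
# Change location and (positive) scale of each variable separately.
shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(base, shifted)
```

Both calls return the same value up to floating-point error. A negative scale factor on one variable would flip the sign of r but leave its magnitude unchanged.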

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
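
The pair-counting definition can be written directly as below. This sketch implements the simplest variant (often called tau-a), which makes no adjustment for ties; many libraries, for example `scipy.stats.kendalltau` by default, compute the tie-adjusted tau-b instead, so values may differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, without tie adjustment."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # only the pair (2nd, 3rd) is discordant
tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.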

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
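
As a sketch of what a bias-corrected Cramér's V can look like, the function below computes the chi-squared statistic from a contingency table and then applies Bergsma's correction to φ² and to the effective numbers of rows and columns. It is an illustration under those assumptions, not the report's own implementation, which may differ in details such as continuity corrections.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic over all cells of the contingency table.
    chi2 = 0.0
    for r_val, r_cnt in row_totals.items():
        for c_val, c_cnt in col_totals.items():
            expected = r_cnt * c_cnt / n
            observed = joint.get((r_val, c_val), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical labels: b is fully determined by a in the first case.
v_perfect = cramers_v_corrected(["x", "y"] * 50, ["p", "q"] * 50)
# In the second case the two labelings are independent.
v_independent = cramers_v_corrected(["x"] * 50 + ["y"] * 50, ["p", "q"] * 50)
print(v_perfect, v_independent)
```

The first call yields a value of 1 (up to floating-point error) and the second yields 0, matching the two ends of the range described above.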

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

StoreNumber  ItemCode  TxnTime  SaleFlag  Quantity  Ignore1  Amount  Ignore3  Ignore4  Ignore5  Ignore6  Ignore7  SegmentCode
01088.410580e+1009-05-2013 00:27N10.011.7984020089626211.79NN1002
11088.947000e+1009-05-2013 00:27N10.01.298402008321021.29NN778
21082.840016e+0909-05-2013 00:27Y20.07.008402008939438.58YN1071
31082.840016e+0909-05-2013 00:27Y20.06.008402008939526.98YN1053
41084.127102e+0909-05-2013 00:27N10.03.698402008941073.69NN3541
51082.363742e+0909-05-2013 00:27N10.03.998402009185803.99NN3418
61088.182900e+1009-05-2013 00:27N20.02.588402009176192.58NN775
71088.182900e+1009-05-2013 00:27N10.01.298402009176201.29NN778
81088.182900e+1009-05-2013 00:27N20.02.588402009176212.58NN778
91082.100001e+0909-05-2013 00:27N20.02.38840200237952.38NN2751
StoreNumber  ItemCode  TxnTime  SaleFlag  Quantity  Ignore1  Amount  Ignore3  Ignore4  Ignore5  Ignore6  Ignore7  SegmentCode
218872337.880011e+0909-05-2013 23:42Y10.01.0080058555141.49YN754
218882337.078400e+0909-05-2013 23:42Y10.02.99800584443653.49YN3163
218892334.900005e+0909-05-2013 23:42Y10.01.0080058362991.19YN1307
218902333.656328e+1009-05-2013 23:50Y10.00.8880059188001.59YN906
218912331.200011e+0909-05-2013 23:52Y20.03.00800608971875.58YN1296
218922336.337120e+1009-05-2013 23:52N10.07.99800608443397.99NN2410
218932336.166761e+1009-05-2013 23:54Y10.02.5080061203112.99YN3248
218942337.084700e+0909-05-2013 23:54Y10.01.75800615910192.49YN1296
218952331.708200e+0909-05-2013 23:54N10.01.9980061534011.99NN1076
218962337.148102e+0909-05-2013 23:55N10.03.7980062150763.79NN734